Recent advances in large language models (LLMs) have demonstrated remarkable successes in zero- and few-shot performance on various downstream tasks, paving the way for applications in high-stakes domains. In this study, we systematically examine the capabilities and limitations of LLMs, specifically GPT-3.5 and ChatGPT, in performing zero-shot medical evidence summarization across six clinical domains. We conduct both automatic and human evaluations, covering several dimensions of summary quality. Our study demonstrates that automatic metrics often do not strongly correlate with the quality of summaries. Furthermore, informed by our human evaluations, we define a terminology of error types for medical evidence summarization. Our findings reveal that LLMs could be susceptible to generating factually inconsistent summaries and making overly convincing or uncertain statements, leading to potential harm due to misinformation. Moreover, we find that models struggle to identify the salient information and are more error-prone when summarizing over longer textual contexts.
- Jaiswal, Ajay; Chen, Tianlong; Rousseau, Justin F.; Peng, Yifan; Ding, Ying; Wang, Zhangyang (2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV))
- Jaiswal, Ajay; Ashutosh, Kumar; Rousseau, Justin F.; Peng, Yifan; Wang, Zhangyang; Ding, Ying (2022 IEEE International Conference on Data Mining (ICDM))
- Wang, Song; Tang, Liyan; Majety, Akash; Rousseau, Justin F.; Shih, George; Ding, Ying; Peng, Yifan (Journal of Biomedical Informatics)
- Jaiswal, Ajay; Li, Tianhao; Zander, Cyprian; Han, Yan; Rousseau, Justin F.; Peng, Yifan; Ding, Ying (IEEE Conference Proceedings)